Component Interactions And Data Flow
This document explains how the browser extension, backend API, MCP server, and external services interact to deliver a seamless agentic browsing experience. It focuses on:
Communication patterns between the React UI, background/content scripts, WebSocket client, backend API, and external services
Real-time data synchronization via WebSocket and HTTP fallback
Asynchronous flows for agent execution, tool invocation, and browser automation
Error propagation, state management, and consistency across components
Performance, caching, and fault tolerance strategies
The system is organized into:
Extension (React UI, background script, content script, WebSocket client, utilities)
Backend API (FastAPI app, routers, services, models)
Agents and tools (LangGraph-based React agent, tool registry)
MCP server (external service integration)
Figure: component diagram. The opening of the diagram, including the file name of the Side Panel UI node, is missing. The recoverable nodes and edges are:
Extension: Side Panel UI (UI), websocket-client.ts (WS, WebSocket Client), background.ts (BG, Background Script), content.ts (CT, Content Script), executeAgent.ts (EXA, HTTP Execution), executeActions.ts (EXB, Browser Actions), parseAgentCommand.ts (PARSE, Command Parser)
Backend API: main.py (API, FastAPI App), run.py (RUN, Uvicorn Runner)
Agents & Tools: react_agent.py (REACT, LangGraph Agent)
Edges: UI --> WS, UI --> EXA, UI --> PARSE, WS --> API, EXA --> API, BG --> CT, EXB --> BG, API --> REACT
WebSocket client encapsulates connection lifecycle, event handling, and agent execution over WebSocket with automatic reconnection.
Side panel UI manages user input, command parsing, progress updates, and browser storage-backed sessions.
Background script handles cross-tab automation, content script injection, and runtime messaging.
Content script provides page context and executes DOM-level actions.
HTTP execution utility builds payloads, captures page context, and invokes backend endpoints.
Backend API exposes routers for agents, tools, and services; runs on Uvicorn.
React agent orchestrates LLM reasoning and tool execution via LangGraph.
The system supports two primary execution modes:
Real-time via WebSocket: UI emits commands; backend streams progress and returns results.
HTTP fallback: UI sends commands via HTTP; backend responds once execution completes.
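As a minimal sketch of how the two modes can coexist, the selector below records the latest connection_status value and picks a transport for each request. The names TransportSelector, ConnectionStatus, and Transport, and the three status values, are assumptions made for illustration; the actual client may model connection state differently.

```typescript
// Hypothetical names throughout: TransportSelector, ConnectionStatus, and
// Transport are illustrative and not taken from the extension's source.
type ConnectionStatus = "connected" | "connecting" | "disconnected";
type Transport = "websocket" | "http";

class TransportSelector {
  private status: ConnectionStatus = "disconnected";

  // Call this from a connection_status event handler.
  onConnectionStatus(status: ConnectionStatus): void {
    this.status = status;
  }

  // WebSocket is used only while the connection is confirmed open;
  // every other state falls back to HTTP.
  current(): Transport {
    return this.status === "connected" ? "websocket" : "http";
  }
}

// Usage: the transport follows the most recent status event.
const selector = new TransportSelector();
const beforeConnect = selector.current();
selector.onConnectionStatus("connected");
const whileConnected = selector.current();
selector.onConnectionStatus("disconnected");
const afterDrop = selector.current();
```

Treating "connecting" the same as "disconnected" keeps the choice conservative: a request is never queued behind a socket that may not open.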
WebSocket Client and Real-Time Execution
Establishes persistent connections with automatic reconnection and transport fallback.
Emits connection status and generation progress events.
Provides executeAgent, stopAgent, and stats APIs with timeouts and cleanup.
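The reconnection policy itself is not described here. One common choice is capped exponential backoff, sketched below. The function name reconnectDelayMs, the 1-second base delay, and the 30-second cap are assumptions for illustration, not values read from websocket-client.ts.

```typescript
// Assumed policy: exponential backoff with a cap. The base delay, cap, and
// function name are illustrative; the real client's settings may differ.
function reconnectDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  // Guard against negative or fractional attempt counters.
  const safeAttempt = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * Math.pow(2, safeAttempt));
}

// Delays for the first few attempts, then a late attempt that hits the cap.
const delays = [0, 1, 2, 3, 10].map((n) => reconnectDelayMs(n));
```

Capping the delay keeps a long outage from pushing the next retry arbitrarily far into the future.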
Side Panel UI and Command Parsing
Parses slash commands into agent/action endpoints and validates availability.
Manages sessions, voice input, file attachments, and mention menus.
Streams progress updates and renders Markdown responses.
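The slash-command step can be sketched as a small pure function. The "/agent-action prompt" syntax, the registry contents, and the names COMMANDS, ParseResult, and parseSlashCommand are all hypothetical; parseAgentCommand.ts may use a different grammar and command set.

```typescript
// Hypothetical registry: the agent and action names below are placeholders,
// not the extension's real command set.
const COMMANDS = new Map<string, string[]>([
  ["react", ["ask"]],
  ["youtube", ["ask"]],
  ["website", ["ask"]],
]);

type ParseResult =
  | { kind: "command"; agent: string; action: string; prompt: string }
  | { kind: "suggestions"; candidates: string[] }
  | { kind: "plain"; prompt: string };

function parseSlashCommand(input: string): ParseResult {
  const text = input.trim();
  if (!text.startsWith("/")) return { kind: "plain", prompt: text };

  // Split "/agent-action rest of prompt" into its head and prompt.
  const spaceAt = text.indexOf(" ");
  const head = spaceAt === -1 ? text.slice(1) : text.slice(1, spaceAt);
  const prompt = spaceAt === -1 ? "" : text.slice(spaceAt + 1).trim();
  const dashAt = head.indexOf("-");
  const agent = dashAt === -1 ? head : head.slice(0, dashAt);
  const action = dashAt === -1 ? "" : head.slice(dashAt + 1);

  const actions = COMMANDS.get(agent);
  if (actions !== undefined && actions.includes(action)) {
    return { kind: "command", agent, action, prompt };
  }

  // Incomplete or unknown command: offer every full command that starts
  // with what the user has typed so far.
  const candidates: string[] = [];
  COMMANDS.forEach((knownActions, knownAgent) => {
    for (const knownAction of knownActions) {
      const full = `/${knownAgent}-${knownAction}`;
      if (full.startsWith(`/${head}`)) candidates.push(full);
    }
  });
  return { kind: "suggestions", candidates };
}

const parsed = parseSlashCommand("/react-ask summarize this page");
const partial = parseSlashCommand("/rea");
```

Returning suggestions instead of an error for partial input lets the UI guide the user toward a valid command rather than rejecting the keystroke.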
HTTP Execution Pipeline
Resolves active or mentioned tab context, captures client HTML, normalizes URLs, and constructs payloads.
Supports special endpoints (React agent, YouTube, website, GitHub, PyJIIT, skills).
Uses GET/POST based on endpoint and returns structured responses.
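The payload steps above can be sketched as follows. The endpoint paths, the field names (prompt, url, client_html), the set of GET endpoints, and the 50,000-character HTML cap are assumptions for illustration; executeAgent.ts defines the real values.

```typescript
// Illustrative only: endpoint paths, field names, the GET/POST split, and the
// HTML cap are assumptions, not the extension's actual values.
interface TabContext {
  url: string;
  html: string;
}

interface AgentRequest {
  method: "GET" | "POST";
  path: string;
  body?: { prompt: string; url: string; client_html: string };
}

const MAX_HTML_CHARS = 50000;
const GET_ENDPOINTS = new Set<string>(["/api/hypothetical/status"]);

function normalizeUrl(raw: string): string {
  const trimmed = raw.trim();
  // Add a scheme when the tab URL was captured without one.
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`;
  const url = new URL(withScheme);
  url.hash = ""; // fragments are never sent to a server
  return url.toString();
}

function buildRequest(path: string, prompt: string, tab: TabContext): AgentRequest {
  const url = normalizeUrl(tab.url);
  if (GET_ENDPOINTS.has(path)) {
    // GET endpoints carry their inputs in the query string and have no body.
    const query = new URLSearchParams({ prompt, url }).toString();
    return { method: "GET", path: `${path}?${query}` };
  }
  return {
    method: "POST",
    path,
    // Truncate captured HTML so the payload stays bounded.
    body: { prompt, url, client_html: tab.html.slice(0, MAX_HTML_CHARS) },
  };
}

const exampleRequest = buildRequest("/api/hypothetical/agent", "Summarize", {
  url: "https://Example.com/docs#intro",
  html: "<html><body>Hello</body></html>",
});
```

Keeping this step a pure function of the tab context makes it testable without a browser; the caller is then responsible for reading the tab and issuing the fetch.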
Browser Automation and Content Scripts
Background script injects content scripts and routes actions to active tabs.
Content script performs DOM-level actions (click, type, scroll) and page info extraction.
Action executor translates agent action plans into tab messages.
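A rough sketch of the translation step follows. The RawAction and TabMessage shapes, the command strings, and the function name toTabMessages are hypothetical; the message format actually exchanged between executeActions.ts, background.ts, and content.ts may differ.

```typescript
// Assumed shapes: RawAction and TabMessage are hypothetical, not the
// extension's real message contract.
interface RawAction {
  type: string;
  selector?: string;
  text?: string;
  deltaY?: number;
}

interface TabMessage {
  command: string;
  payload: Record<string, string | number>;
}

function toTabMessages(plan: RawAction[]): { messages: TabMessage[]; errors: string[] } {
  const messages: TabMessage[] = [];
  const errors: string[] = [];

  plan.forEach((action, index) => {
    switch (action.type) {
      case "click":
        if (action.selector) {
          messages.push({ command: "click", payload: { selector: action.selector } });
        } else {
          errors.push(`action ${index}: click needs a selector`);
        }
        break;
      case "type":
        if (action.selector && action.text !== undefined) {
          messages.push({ command: "type", payload: { selector: action.selector, text: action.text } });
        } else {
          errors.push(`action ${index}: type needs a selector and text`);
        }
        break;
      case "scroll":
        messages.push({ command: "scroll", payload: { deltaY: action.deltaY ?? 0 } });
        break;
      default:
        // Unknown types are reported instead of silently dropped.
        errors.push(`action ${index}: unknown type "${action.type}"`);
    }
  });

  return { messages, errors };
}

const translated = toTabMessages([
  { type: "click", selector: "#submit" },
  { type: "type", selector: "#q", text: "hello" },
  { type: "hover", selector: "#menu" },
]);
```

Collecting errors alongside the valid messages lets the caller decide whether to run a partial plan or abort the whole plan.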
Backend API and Agent Orchestration
FastAPI app aggregates routers for agents, tools, and services.
Uvicorn runner starts the server on a configurable host/port.
React agent compiles a LangGraph workflow and executes tool calls asynchronously.
UI depends on WebSocket client and HTTP execution utility.
Background script depends on content script and browser APIs.
HTTP execution utility depends on browser storage, tabs, and scripting APIs.
Backend API depends on routers and the React agent.
React agent depends on LLM and tool registry.
WebSocket streaming reduces latency for long-running agent executions; HTTP fallback ensures reliability when WebSocket is unavailable.
Payload construction captures minimal client HTML and limits DOM introspection to reduce overhead.
Action executor introduces small delays between actions to prevent race conditions and improve stability.
Caching: React agent graph is cached via LRU to avoid recompilation costs.
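To illustrate the LRU idea only, here is a small generic cache. The backend is Python, where a decorator such as functools.lru_cache would be the usual tool; this TypeScript sketch is not the agent's code, and LruCache, compileGraph, and the "model-a" key are hypothetical names.

```typescript
// Generic LRU cache sketch; it relies on Map preserving insertion order.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  getOrCreate(key: K, create: () => V): V {
    if (this.entries.has(key)) {
      const hit = this.entries.get(key) as V;
      // Re-insert so this key becomes the most recently used.
      this.entries.delete(key);
      this.entries.set(key, hit);
      return hit;
    }
    const value = create();
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
    return value;
  }
}

// Stand-in for an expensive compilation step (hypothetical).
let compileCount = 0;
const compileGraph = (model: string) => {
  compileCount += 1;
  return { model };
};

const graphCache = new LruCache<string, { model: string }>(2);
const firstGraph = graphCache.getOrCreate("model-a", () => compileGraph("model-a"));
const secondGraph = graphCache.getOrCreate("model-a", () => compileGraph("model-a"));
```

The second lookup returns the same object without calling compileGraph again, which is the saving the cache is meant to provide.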
Recommendations:
Prefer WebSocket for interactive sessions; degrade gracefully to HTTP.
Limit DOM capture size and scope; avoid unnecessary reflows.
Batch browser actions and debounce UI updates.
WebSocket connectivity:
Monitor connection_status events and fall back to HTTP when disconnected.
Use getStats with timeout to detect backend responsiveness.
Command parsing:
Ensure slash commands are complete; suggestions shown for partial commands guide users toward a valid one.
Browser automation:
Verify content script injection and tab permissions.
Handle unknown action types and timeouts during navigation/reload.
HTTP errors:
Normalize error messages for rate limits, gateway errors, and service unavailability.
Storage and sessions:
Persist sessions in browser storage; migrate legacy chat history if needed.
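For the HTTP errors item above, normalization can be as simple as mapping status codes to user-facing text. The function name normalizeHttpError and the message wording are assumptions; only the three categories (rate limits, gateway errors, service unavailability) come from this document.

```typescript
// Hypothetical helper: the message text is illustrative, not the extension's
// actual wording.
function normalizeHttpError(status: number): string {
  if (status === 429) {
    return "Rate limit reached. Please wait a moment and try again.";
  }
  if (status === 502 || status === 504) {
    return "The gateway could not reach the backend. Please retry.";
  }
  if (status === 503) {
    return "The service is temporarily unavailable.";
  }
  // Anything else keeps the raw status so it can still be diagnosed.
  return `Request failed with HTTP status ${status}.`;
}

const rateLimitMessage = normalizeHttpError(429);
```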
The system integrates a React-based UI, background/content scripts, WebSocket streaming, and a FastAPI backend with a LangGraph-powered agent. It balances real-time responsiveness with robust HTTP fallback, manages state across browser storage and UI components, and provides clear error propagation and recovery paths. By leveraging caching, minimal payload construction, and cautious automation, it maintains performance and reliability across distributed components.